Visualization of variable binding pockets on protein surfaces by probabilistic analysis of related structure sets
نویسندگان
چکیده
Background: Protein structures provide a valuable resource for rational drug design. For a protein with no known ligand, computational tools can predict surface pockets that are of suitable size and shape to accommodate a complementary small-molecule drug. However, pocket prediction against single static structures may miss features of pockets that arise from proteins’ dynamic behaviour. In particular, ligand-binding conformations can be observed as transiently populated states of the apo protein, so it is possible to gain insight into ligand-bound forms by considering conformational variation in apo proteins. This variation can be explored by considering sets of related structures: computationally generated conformers, solution NMR ensembles, multiple crystal structures, homologues or homology models. It is non-trivial to compare pockets, either from different programs or across sets of structures. For a single structure, difficulties arise in defining particular pocket’s boundaries. For a set of conformationally distinct structures the challenge is how to make reasonable comparisons between them given that a perfect structural alignment is not possible. Results: We have developed a computational method, Provar, that provides a consistent representation of predicted binding pockets across sets of related protein structures. The outputs are probabilities that each atom or residue of the protein borders a predicted pocket. These probabilities can be readily visualised on a protein using existing molecular graphics software. We show how Provar simplifies comparison of the outputs of different pocket prediction algorithms, of pockets across multiple simulated conformations and between homologous structures. We demonstrate the benefits of use of multiple structures for protein-ligand and protein-protein interface analysis on a set of complexes and consider three case studies in detail: i) analysis of a kinase superfamily highlights the conserved occurrence of surface pockets at the active and regulatory sites; ii) a simulated ensemble of unliganded Bcl2 structures reveals extensions of a known ligand-binding pocket not apparent in the apo crystal structure; iii) visualisations of interleukin-2 and its homologues highlight conserved pockets at the known receptor interfaces and regions whose conformation is known to change on inhibitor binding. Conclusions: Through post-processing of the output of a variety of pocket prediction software, Provar provides a flexible approach to the analysis and visualization of the persistence or variability of pockets in sets of related protein structures. Background The availability of a protein’s 3D structure may provide insight into its mechanism and a basis for rational design of small molecule modulators of its function. Key to understanding and modifying function in many proteins is knowledge of the structure of potential binding sites. Rational drug design strategies may then be used to design small molecules that bind to complementary features of such sites. In the absence of a structure containing a ligand, computational tools allow prediction of small molecule binding sites by scanning the protein’s surface for pockets. These pockets must at least be of a size and shape that allows a ligand to bind with suitable specificity and affinity. Existing computational tools use a variety of methods to identify pockets, * Correspondence: [email protected]; [email protected]. uk Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK Full list of author information is available at the end of the article Ashford et al. BMC Bioinformatics 2012, 13:39 http://www.biomedcentral.com/1471-2105/13/39 © 2012 Ashford et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. the simplest are based on local geometry and include PASS [1], LIGSITE [2], Pocket [3], PocketPicker [4], SURFNET [5], CAST [6] and fpocket [7]. Additional properties can be employed in pocket prediction, for example, LIGSITE-csc [8] and Concavity [9] combine structural information with sequence conservation scores, and Q-Site Finder [10] considers the energy of binding of hydrophobic probes to the protein’s surface. It has been shown that the application of these tools to the analysis of individual static structures is useful in identifying a primary binding site [11,12], such as an enzyme’s active site. However, proteins in solution are dynamic entities that explore conformational space over time due to side chain motions, local backbone flexibility and larger sub-domain or domain motions [13]. It follows that predictions of pockets based on single static structures may fail to detect potential binding sites, or features of such sites, that result from changes in their shape and size over time. The conformational selection hypothesis posits that bound conformations of proteins are often observed as transiently populated, high freeenergy conformations of the apo protein. Ligand binding simply lowers the free-energy of the binding-capable conformation, thus increasing the probability and population of this state [14]. It is thus supposed that some points of the conformational space dynamically explored by the apo protein correspond to the pocket conformation of a ligand-bound form. Therefore, we can expect to gain insight into binding-capable pockets from inspecting conformational variants of proteins. Sets of variants can be derived from several sources: simulated ensembles created using Molecular Dynamics (MD) [15,16], Essential Dynamics (ED) [17], Normal Mode Analysis (NMA) [18] or constraint-based methods such as CONCOORD [19] and tCONCOORD [20]; solutionNMR conformational ensembles; multiple structures of the same protein solved in different crystal forms, or with different ligands or experimental conditions. It has also been shown that the structure-space explored within sets of homologues correlates with that observed with MD simulations [21], consequently homologous superfamilies of proteins provide other potentially useful sets of variant structures. Difficulties comparing predicted pockets between different programs or across related structures Given a set of related protein structures, how do we compare their pockets and the variation within the set? An approach is to designate particular pockets, ‘Pocket A’, ‘Pocket B’, etc., and in each case perform some detailed analysis of the pocket’s geometry and other characteristics. This can be effective with a single, highly-conserved pocket, such as an enzyme’s active site [22]. However, a problem is that it is not clear how to consistently and unambiguously define each pocket in terms of its boundaries in cases where there are many possible sites i.e. are neighbouring pockets best considered as two distinct entities, or as part of a single contiguous whole (for discussion see [23,24])? An additional complication is that for any given single structure we can obtain different predicted pockets depending on which software we choose to run. A simple case of two different programs’ (PASS and LIGSITE) outputs for a single structure of human interleukin-2 (IL-2) is illustrated in Figure 1A, which shows only partial overlap of the clusters of pocket points for the two programs. In comparing pockets predicted in homologues, the issue is to identify whether apparent differences in pocket locations are simply a result of problems with structural alignment (Figure 1B). The visual comparison of pockets in diverse conformations of a single protein can also be affected by difficulties with alignment. Figure 2 illustrates how the spread of PASS predicted pocket points increases with the number of conformations used, an effect caused both by local structural variations and differences in global orientation required to produce the best overall alignments of the structures. Analysing a large set of conformers may help identify structural variation in otherwise persistent pockets, and transient pockets that are observed in only some members of the set. Finding evidence of variable or transient pockets may suggest novel targets and provide a useful adjunct to rational drug design strategies. Recent analyses have addressed the evidence for transient pockets by applying PASS to snapshots of MD trajectories of a member of the B-cell lymphoma family (Bcl-XL), human IL-2 and mouse double minute 2 (MDM-2) [25] and identifying pockets across snapshots by clustering on pocket volume and the overlapping pocket-lining residues, finding several distinct transient pockets in these systems. In a follow-up study, tCONCOORD produced comparable pockets to the MD simulations for Bcl-XL and MDM-2 [26]. Another recent approach uses the fpocket prediction algorithm, with a specific front-end for analysis of MD ensembles and trajectories (MDpocket [27]), to understand how pockets change across many different conformations by mapping pocket predictions for each structure onto a fixed spatial grid, calculating the frequency of occurrence of a pocket at each grid point, and visualising this as a density map superimposed on a reference protein structure [7]. Here we propose a similarly probabilistic, but otherwise distinct, approach that differs in utilising the output of any suitable geometric pocket prediction program, including PASS, LIGSITE, fpocket and SiteMap [28], and maps each set of predictions to its corresponding protein structure before calculating overall probability densities. We present a method that provides standardised visual Ashford et al. BMC Bioinformatics 2012, 13:39 http://www.biomedcentral.com/1471-2105/13/39 Page 2 of 16
منابع مشابه
Comparative modelling of 3D-structure of Geobacter sp. M21 (a metal reducing bacteria) Mn-Fe superoxide dismutase and its binding properties with bisphenol-A, aminotriazole and ethylene-diurea
Superoxide dismutase play important roles in iron-respiratory bacteria such as Geobacteraceae as an antioxidant defense, and probably an effective enzyme of electron transfer network. Regarding the application of iron-respiratory bacteria in environmental biotechnology particularly biodegradation and bioremediation, understanding the mechanism of inhibition/induction of superoxide dismutase by ...
متن کاملPrediction of 3D protein Structure based on Mutation of AKAP3 and PLOD3 Gene in Case of Non-Obstructive Azoospermia
Background: The present study has been designed with the aim of evaluating A-kinase anchoring proteins 3 (AKAP3)and Procollagen-Lysine, 2-Oxoglutarate 5-Dioxygenase 3 (PLOD3) gene mutations and prediction of 3D proteinstructure for ligand binding activity in the cases of non-obstructive azoospermic male.Materials and Methods: Clinically diagnosed cases of non-obstructive azoos...
متن کاملA Novel Geometric Algorithm for Protein Pocket Extraction, Quantification And Visualization
Molecular surfaces of proteins and other bio-molecules, while modeled as smooth analytic interfaces separating the molecule from solvent, often contain a number of pockets, holes and interconnected tunnels, aka molecular features in contact with the solvent. Several of these molecular features are biochemically significant as pockets are often active sites for ligand binding or enzymatic reacti...
متن کاملBioinformatics prediction and experimental validation of VH antibody fragment interacting with Neisseria meningitidis factor H binding protein
Objective(s): We previously conducted an in silico research on the interactions between the ribosome display-selected single chain variable fragment (scFv) and factor H binding protein (fHbp) of Neisseria meningitidis. We found that heavy chain variable (VH) fragment of this scFv had considerable affinity to fHbp. These results led us to evaluate the ability of this sm...
متن کاملDetermination of Superficial Clefts on Fragment of Antigen Binding in Human Immunoglobulin G by Computational Immunology
Background: Immunoglobulins (Igs) are protective glycoproteins specifically identify and eradicate microbes. Fragment of antigen binding (Fab) is a portion of antibody which binds to antigen and consists of one variable and one constant domain of one heavy and one light chain. Idiotypes, epitopes situated on Igs variable region, could be exploited to monitor and target malignant B cells and are...
متن کامل